A simplified approach to high quality music and sound over IP
Present systems for streaming digital audio between devices connected by the internet have been limited by a number of compromises. Because of restricted bandwidth and “best effort” delivery, signal compression of one form or another is typical. Buffering of audio data, which is needed to safeguard against delivery uncertainties, can cause signal delays of seconds. Audio is in general an unforgiving test of networking: if a single data packet arrives too late, we hear it. Trade-offs in signal quality have been necessary to work around this basic fact and, until now, have worked against serious musical uses. In late 1998, work began on audio applications designed specifically for next-generation networks and able to meet the stringent requirements of professional-quality music streaming. A related experiment was begun to explore the use of audio as a network measurement tool. SoundWIRE (sound waves over the internet from real-time echoes) creates a sonar-like ping that displays the qualities of bidirectional connections to the ear. Recent experiments have achieved sustained coast-to-coast audio connections whose round-trip times are within a factor of 2 of the limit imposed by the speed of light. Full-duplex speech over these connections feels comfortable, and in an IIR recirculating form that creates echoes like SoundWIRE, users can experience singing into a transcontinental echo chamber. Three simplifications to audio streaming are suggested in this paper: compression is eliminated to reduce delay and enhance signal quality; TCP/IP is used in unidirectional flows for its delivery guarantees, which removes the need for application software to correct transmission errors; and QoS bounds the latency and jitter affecting long-haul bidirectional flows.
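The round-trip and echo ideas above can be illustrated with a small Python sketch. This is not SoundWIRE's implementation. It computes the propagation-only lower bound on round-trip time for an assumed path length and expresses a measured RTT as a multiple of that bound. It also simulates the "IIR recirculating form" locally as a feedback comb filter, with the delay standing in for one network round trip. The path length, sample rate, feedback gain, and the function names (`min_rtt_s`, `rtt_factor`, `recirculating_echo`) are hypothetical choices for illustration. Light in optical fiber travels at roughly two-thirds of its vacuum speed, so the `speed_km_per_s` parameter makes explicit which bound a measurement is compared against.

```python
# Hypothetical sketch, not SoundWIRE's code: path length, propagation speed,
# sample rate and feedback gain are illustrative assumptions.

C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def min_rtt_s(path_km, speed_km_per_s=C_KM_PER_S):
    """Propagation-only lower bound on round-trip time (out and back)."""
    return 2.0 * path_km / speed_km_per_s


def rtt_factor(measured_rtt_s, path_km, speed_km_per_s=C_KM_PER_S):
    """How many times slower than the light-speed bound a measured RTT is."""
    return measured_rtt_s / min_rtt_s(path_km, speed_km_per_s)


def recirculating_echo(x, delay_samples, gain):
    """Feedback comb filter: y[n] = x[n] + gain * y[n - delay_samples]."""
    if delay_samples < 1:
        raise ValueError("delay_samples must be >= 1")
    if not abs(gain) < 1.0:
        raise ValueError("|gain| must be < 1 so the echoes decay")
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        # Output from one "round trip" ago is fed back, scaled by gain.
        feedback = y[n - delay_samples] if n >= delay_samples else 0.0
        y[n] = xn + gain * feedback
    return y


# Usage with assumed values: a click sent around a loop whose round trip
# is twice the light-speed bound for a 4000 km one-way path.
fs = 44_100                      # assumed sample rate, Hz
path_km = 4_000                  # assumed one-way path length, km
rtt = 2.0 * min_rtt_s(path_km)   # RTT at a factor of 2 of the bound
D = round(rtt * fs)              # round-trip delay in samples
click = [1.0] + [0.0] * (4 * D)
out = recirculating_echo(click, D, gain=0.6)
# Echoes appear at samples D, 2D, 3D, 4D, each 0.6 times the previous one.
```

Each pass around the loop adds another echo scaled by `gain`, which is why its magnitude must stay below 1. Over a real network the delay would be whatever the connection's round trip happens to be, jitter included, rather than a fixed sample count.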
Bayesian Identification of Closely-Spaced Chords from Single-Frame STFT Peaks
Identifying chords and related musical attributes from digital audio has been a long-standing problem spanning decades of research. Robust identification may facilitate automatic transcription, semantic indexing, polyphonic source separation, and other emerging applications. To this end, we develop a Bayesian inference engine operating on single-frame STFT peaks. Peak likelihoods conditional on pitch component information are evaluated by an MCMC approach that accounts for overlapping harmonics as well as undetected or spurious peaks, enabling operation in noisy environments at very low computational cost. Our inference engine evaluates posterior probabilities of musical attributes such as root, chroma (including inversion), octave, and tuning, given observations of STFT peak frequencies and amplitudes. The resulting posteriors become highly concentrated around the correct attributes, as demonstrated using 227 ms piano recordings with −10 dB additive white Gaussian noise.
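The Python sketch below is a heavily simplified illustration of the underlying Bayesian idea. It is not the paper's inference engine. Instead of MCMC, it enumerates a small, hypothetical set of hypotheses (root-position major and minor triads with roots C3 to B3) and computes exact posteriors under a uniform prior. Each observed peak frequency is scored in cents by a mixture of two parts: a uniform "spurious peak" density, and Gaussians centred on the first few harmonics of the hypothesised chord. The paper models octave, tuning, inversion, peak amplitudes, and undetected peaks; this sketch ignores all of them. The constants and names (`chord_posterior`, `peak_likelihood`, `sigma`, `p_spur`, `n_harm`) are illustrative assumptions.

```python
import math

# Hypothetical toy model, NOT the paper's MCMC engine: exhaustive enumeration
# over a few chord hypotheses, scoring peak frequencies only.

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
QUALITIES = {"maj": (0, 4, 7), "min": (0, 3, 7)}  # root-position triads


def midi_to_hz(m):
    return 440.0 * 2.0 ** ((m - 69) / 12.0)


def cents(f):
    return 1200.0 * math.log2(f)


def predicted_cents(notes, n_harm):
    """Log-frequencies (cents) of the first n_harm harmonics of each note."""
    return [cents(k * midi_to_hz(m)) for m in notes for k in range(1, n_harm + 1)]


def peak_likelihood(f, pred, sigma=20.0, p_spur=0.1, f_lo=50.0, f_hi=5000.0):
    """Density of one peak in cents: uniform spurious term + harmonic Gaussian mixture."""
    c = cents(f)
    gauss = sum(math.exp(-0.5 * ((c - p) / sigma) ** 2) for p in pred)
    gauss /= len(pred) * sigma * math.sqrt(2.0 * math.pi)
    uniform = 1.0 / (cents(f_hi) - cents(f_lo))
    return p_spur * uniform + (1.0 - p_spur) * gauss  # always > 0, so log is safe


def chord_posterior(peaks_hz, roots=range(48, 60), n_harm=4):
    """Posterior over (root MIDI note, quality) under a uniform prior."""
    log_post = {}
    for r in roots:
        for q, intervals in QUALITIES.items():
            pred = predicted_cents([r + i for i in intervals], n_harm)
            log_post[(r, q)] = sum(math.log(peak_likelihood(f, pred)) for f in peaks_hz)
    m = max(log_post.values())  # log-sum-exp normalisation for stability
    z = sum(math.exp(v - m) for v in log_post.values())
    return {h: math.exp(v - m) / z for h, v in log_post.items()}


# Usage: synthetic peaks from the first 4 harmonics of C3-E3-G3 plus one stray peak.
peaks = [k * midi_to_hz(m) for m in (48, 52, 55) for k in range(1, 5)] + [2500.0]
post = chord_posterior(peaks)
best = max(post, key=post.get)
print(NAMES[best[0] % 12], best[1], round(post[best], 3))
```

The spurious-peak term never vanishes, and no hypothesis predicts anything near 2500 Hz. That stray peak therefore scales every hypothesis's likelihood by the same factor, which cancels in the normalisation instead of ruling the correct chord out.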